Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes
Abstract
It is well known that any finite state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted long-term expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state, optimal policies must generally be stochastic if no additional information, such as the observation history, is available. In the context of embodied artificial intelligence and systems design, an agent's policy is subject to hard physical constraints and must be as efficient as possible. With this in mind, we focus on memoryless POMDPs. We cast the optimization problem as a constrained linear optimization problem and develop a corresponding geometric framework. We show that any POMDP has an optimal memoryless policy of limited stochasticity, which means that we can give an upper bound on the number of deterministic policies that need to be mixed to obtain an optimal stationary policy, regardless of the specific reward function.
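The claim that optimal memoryless POMDP policies are generally stochastic can be illustrated on a toy instance. The following sketch (a hypothetical example constructed for illustration, not taken from the paper) uses a 2-state POMDP in which both states emit the same observation, so a memoryless policy reduces to a single probability p = Pr[a0]; scanning p shows that the best such policy is strictly in the interior, beating both deterministic policies:

```python
import numpy as np

# Hypothetical toy POMDP (illustration only, not the paper's construction):
# two hidden states, one shared observation, two actions.
# a0: state 0 -> state 1 (reward +1); state 1 -> state 1 (reward -1)
# a1: state 0 -> state 0 (reward -1); state 1 -> state 0 (reward +1)
gamma = 0.9
P = {0: np.array([[0.0, 1.0], [0.0, 1.0]]),
     1: np.array([[1.0, 0.0], [1.0, 0.0]])}
R = {0: np.array([1.0, -1.0]),
     1: np.array([-1.0, 1.0])}

def value(p):
    """Discounted value of the memoryless policy Pr[a0] = p,
    averaged over a uniform start distribution."""
    P_pi = p * P[0] + (1 - p) * P[1]   # policy-averaged transition matrix
    r_pi = p * R[0] + (1 - p) * R[1]   # policy-averaged expected reward
    # Solve V = r_pi + gamma * P_pi V  for the fixed-policy value vector.
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return v.mean()

grid = np.linspace(0.0, 1.0, 101)
best = max(grid, key=value)
print(best, value(best))       # interior optimum (here at p = 0.5)
print(value(0.0), value(1.0))  # both deterministic policies do worse
```

Since the agent cannot distinguish the two states, any deterministic choice eventually gets trapped collecting reward −1 every step, while randomizing hedges between the states; this is exactly the phenomenon that forces the mixing of deterministic policies bounded in the paper.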
Similar Resources
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can be a factor in either increasing or decreasing a system's availability, so it is valuable to evaluate a maintenance policy from both the cost and the availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
Optimal Control for Partially Observable Markov Decision Processes over an Infinite Horizon
In this paper we consider an optimal control problem for partially observable Markov decision processes with finite states, signals and actions over an infinite horizon. It is shown that there are ε-optimal piecewise-linear value functions and piecewise-constant policies which are simple. Simple means that there are only finitely many pieces, each of which is defined on a convex polyhedral set...
Learning in non-stationary Partially Observable Markov Decision Processes
We study the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not perfectly known and may change over time. We present the algorithm MEDUSA+, which incrementally improves a POMDP model using selected queries, while still optimizing the reward. Empirical results show the response of the algorithm to changes in the parameters of a m...
Good Policies for Partially Observable Markov Decision Processes Are Hard to Find
Optimal policy computation in finite-horizon Markov decision processes is a classical problem in optimization with many practical applications. For stationary policies and infinite horizon it is known to be solvable in polynomial time by linear programming, whereas for finite horizon it is a long-standing open problem. We consider this problem for a slightly generalized model, namely partially obse...
Optimal control of infinite horizon partially observable decision processes modelled as generators of probabilistic regular languages
Decision processes with incomplete state feedback have traditionally been modelled as partially observable Markov decision processes. In this article, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalises the recently reported work on language-measure-theoretic optimal control for perfectly observable situations and shows that such a f...
Journal: CoRR
Volume: abs/1503.07206
Pages: -
Publication year: 2015